A cascaded classification approach to disambiguating polysemous mentions with social chains

Authors

  • Yu-Chuan Wei
  • Ming-Shun Lin
  • Hsin-Hsi Chen
Abstract

This paper considers five features for personal name disambiguation: titles, community chains, terms, temporal expressions, and hostnames. Using nine test data sets covering three ambiguous personal names, we address the awareness degree of an entity, the source of materials, and web pages in different areas. In a single-clusterer approach, employing all features achieves an average F-score of 0.635, which is better than the 0.502 achieved with contextual terms only. When community chains are expanded by using the web, the average F-score increases to 0.676. We also propose a multiple-clusterer approach, which cascades five clusterers corresponding to the five features. The average F-score is further improved to 0.684. Expanding community chains also raises the average F-score of the multiple-clusterer approach to 0.697. In summary, the proposed features are quite useful; the cascaded multiple-clusterer approach is better than the single-clusterer approach; and expanding community chains using the web has positive effects on personal name disambiguation. The experiments show that this approach yields significant improvements. © 2010 Elsevier Ltd. All rights reserved.
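
The cascade of per-feature clusterers described in the abstract can be pictured with a small sketch. The abstract does not say how each clusterer decides to merge, nor in which order the clusterers run, so everything below is an assumption for illustration only: the page dictionaries, the names `cluster_values`, `merge_stage`, and `cascade`, the greedy "share at least one value" merge rule, and the feature order (taken from the order in which the abstract lists the features).

```python
# Assumed cascade order: the order in which the abstract lists the features.
FEATURE_ORDER = ["titles", "community_chains", "terms",
                 "temporal_expressions", "hostnames"]


def cluster_values(cluster, feature):
    """Union of one feature's values over every page in a cluster."""
    values = set()
    for page in cluster:
        values |= page["features"].get(feature, set())
    return values


def merge_stage(clusters, feature, min_shared=1):
    """One clusterer: a greedy single pass that merges each cluster into the
    first earlier cluster sharing at least `min_shared` values of `feature`.
    (Hypothetical merge rule, not taken from the paper.)"""
    merged = []
    for cluster in clusters:
        values = cluster_values(cluster, feature)
        for existing in merged:
            if len(values & cluster_values(existing, feature)) >= min_shared:
                existing.extend(cluster)
                break
        else:
            # No earlier cluster shares enough values: keep it separate.
            merged.append(list(cluster))
    return merged


def cascade(pages, order=FEATURE_ORDER):
    """Feed the output clusters of each stage into the next stage."""
    clusters = [[page] for page in pages]  # start from singleton clusters
    for feature in order:
        clusters = merge_stage(clusters, feature)
    return [sorted(page["id"] for page in cluster) for cluster in clusters]


# Made-up pages that all mention the same ambiguous name.
example_pages = [
    {"id": "a", "features": {"titles": {"professor"}, "community_chains": {"chen"}}},
    {"id": "b", "features": {"titles": {"professor"}}},
    {"id": "c", "features": {"community_chains": {"chen"}}},
    {"id": "d", "features": {"titles": {"singer"}, "community_chains": {"wu"}}},
]

result = cascade(example_pages, order=["titles", "community_chains"])
print(result)  # [['a', 'b', 'c'], ['d']]
```

In this toy run the title stage groups pages `a` and `b`, and the community-chain stage then attaches page `c`, which has no title at all. That is the kind of gain a cascade can offer over a clusterer that relies on a single feature.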

Similar resources

Processing and Representation of Ambiguous Words in Chinese Reading: Evidence from Eye Movements

In the current study, we used eye tracking to investigate whether senses of polysemous words and meanings of homonymous words are represented and processed similarly or differently in Chinese reading. Readers read sentences containing target words which were either homonymous or polysemous words. The contexts of text preceding the target words were manipulated to bias the participants towa...

Polysemy in Sentence Comprehension: Effects of Meaning Dominance.

Words like church are polysemous, having two related senses (a building and an organization). Three experiments investigated how polysemous senses are represented and processed during sentence comprehension. On one view, readers retrieve an underspecified, core meaning, which is later specified more fully with contextual information. On another view, readers retrieve one or more specific senses...

A cascaded approach to normalising gene mentions in biomedical literature

Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation, and ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gen...

A Maximum Entropy Approach To Disambiguating VerbNet Classes

This paper focuses on verb sense disambiguation cast as inferring the VerbNet class to which a verb belongs. To train three different supervised learning models –Maximum Entropy (MaxEnt), Naive Bayes and Decision Tree– we used lexical, co-occurrence and typed-dependency features. For each model, we built three classifiers: one single classifier for all verbs, one single classifier for polysemou...

Arabic Cross-Document Person Name Normalization

This paper presents a machine learning approach based on an SVM classifier coupled with preprocessing rules for cross-document named entity normalization. The classifier uses lexical, orthographic, phonetic, and morphological features. The process involves disambiguating different entities with shared name mentions and normalizing identical entities with different name mentions. In evaluating th...

Journal:
  • Expert Syst. Appl.

Volume 37

Publication date: 2010